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Abstract —The sparse Beyesian learning (also referred to as 
Bayesian compressed sensing) algorithm is one of the most pop¬ 
ular approaches for sparse signal recovery, and has demonstrated 
superior performance in a series of experiments. Nevertheless, the 
sparse Bayesian learning algorithm has computational complexity 
that grows exponentially with the dimension of the signal, 
which hinders its application to many practical problems even 
with moderately large data sets. To address this issue, in this 
paper, we propose a computationally efficient sparse Bayesian 
learning method via the generalized approximate message passing 
(GAMP) technique. Specifically, the algorithm is developed within 
an expectation-maximization (EM) framework, using GAMP to 
efficiently compute an approximation of the posterior distribution 
of hidden variables. The hyperparameters associated with the 
hierarchical Gaussian prior are learned by iteratively maximizing 
the Q-functlon which is calculated based on the posterior approx¬ 
imation obtained from the GAMP. Numerical results are provided 
to Illustrate the computational efficacy and the effectiveness of 
the proposed algorithm. 

Index Terms —Sparse Bayesian learning, generalized approxi¬ 
mate message passing, expectation-maximization. 

I. Introduction 

Compressed sensing is a recently emerged technique for 
signal sampling and data acquisition which enables to recover 
sparse signals from much fewer linear measurements 

y = Ax + w (1) 

where A G is the sampling matrix with M N, x 

denotes an TV-dimensional sparse signal, and w denotes the 
additive noise. Such a problem has been extensively studied 
and a variety of algorithms, e.g. the orthogonal matching 
pursuit (OMP) algorithm [1], the basis pursuit (BP) method 
[2], and the iterative reweighted £i and £2 algorithms [3], were 
proposed. Besides these methods, another important class of 
compressed sensing techniques that have received signihcant 
attention are Bayesian methods, among which sparse Bayesian 
learning (also referred to as Bayesian compressed sensing) 
is considered as one of the most popular compressed sens¬ 
ing methods. Sparse Bayesian learning (SBL) was originally 
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proposed by Tipping in his pioneering work [4] to address 
the regression and classihcation problems. Later on in [5], 
[6], sparse Bayesian learning was adapted for sparse signal 
recovery, and demonstrated superiority over the greedy meth¬ 
ods and the basis pursuit method in a series of experiments. 
Despite its superior performance, a major drawback of the 
sparse Bayesian learning method is that it requires to compute 
an inverse of an V x V matrix at each iteration, and thus 
has computational complexity that grows exponentially with 
the dimension of the signal. This high computational cost 
prohibits its application to many practical problems with even 
moderately large data sets. 

In this paper, we develop a computationally efficient gen¬ 
eralized approximate message passing (GAMP) algorithm 
for sparse Bayesian learning. GAMP, introduced by Donoho 
et. al. [7], [8] and generalized by Rangan [9], is a newly 
emerged Bayesian iterative technique developed in a mes¬ 
sage passing-based framework for efficiently computing an 
approximation of the posterior distribution of x, given a 
pre-specified prior distribution for x and a distribution for 
w. In many expectation-maximization (EM)-based Bayesian 
methods (including SBL), the major computational task is to 
compute the posterior distribution of the hidden variable x. 
GAMP can therefore be embedded in the EM framework to 
provide an approximation of the true posterior distribution of 
X, thus resulting in a computationally efficient algorithm. Eor 
example, in [10], [11], GAMP was used to derive efficient 
sparse signal recovery algorithms, with a Markov-tree prior 
or a Gaussian-mixture prior placed on the sparse signal. In 
this work, by resorting to GAMP, we develop an efficient 
sparse Bayesian learning method for sparse signal recovery. 
Simulation results show that the proposed method performs 
similarly as the EM-based sparse Bayesian learning method, 
meanwhile achieving a significant computational complexity 
reduction. We note that an efficient sparse Bayesian learning 
algorithm was developed in [12] via belief propagation. The 
work, however, requires a sparse dictionary A to facilitate 
the algorithm design, which may not be satished in practical 
applications. 


II. Overview oe Sparse Bayesian Learning 

We first provide a brief review of the sparse Bayesian 
learning method. In the sparse Bayesian learning framework, 
a two-layer hierarchical prior model was proposed to promote 
the sparsity of the solution. In the first layer, x is assigned a 
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Gaussian prior distribution 

N N 

p{x\a) = P[ p{xn\an) = Jl a~^) ( 2 ) 

n—1 n—1 

where a„ is a non-negative hyperparameter controlling the 
sparsity of the coefficient a;„. The second layer specifies 
Gamma distributions as hyperpriors over the hyperparameters 
{an}, i.e. 

N N 

p{(^) = P[ Gamma(a„|a, &) = P[ 

n=l n—1 

where r(a) = is the Gamma function. Besides, 

w is assumed Gaussian noise with zero mean and covariance 
matrix ( 1 / 7 )/. We place a Gamma hyperprior over 7 : p{'-f) = 
Gamma( 7 |c,d) = ■ 

An expectation-maximization (EM) algorithm can be de¬ 
veloped for learning the sparse signal x as well as the 
hyperparameters {a, 7 }. In the EM formulation, the signal x 
is treated as hidden variables, and we iteratively maximize 
a lower bound on the posterior probability p{a.,-f\y) (this 
lower bound is also referred to as the Q-function). Briefly 
speaking, the algorithm alternates between an E-step and a 
M-step. In the E-step, we need to compute the posterior 
distribution of x conditioned on the observed data and the 
estimated hyperparameters, i.e. 

p(a;|y,a(‘\ 7 (‘)) cx p(a:|a(*))p(y|a;, 7 ^*)) (3) 

It can be readily verified that the posterior p(x|y, 7 ^*^) 
follows a Gaussian distribution with its mean and covariance 
matrix given respectively by 

fj, 

$ =(7WA^A-f £))-i (4) 

where D = diag(a^*\ ..., The Q-function, i.e. 

E^ly ait)^^(t) [logp(a, 7 |y)], can then be computed, where the 
operator £'a;|y,a(‘), 7 (t) [■] denotes the expectation with respect 
to the posterior distribution p(a;|y, 7 *^*)). In the M-step, 

we maximize the Q-function with respect to the hyperparam¬ 
eters {a, 7 }, which leads to the following update rules 

(t+i) _ 2 a- 1 
{xl)+2b 
_ M + 2c-2 
{\\y - Ax\\l) + 2d 

where (•) denotes the expectation with respect to the posterior 
distribution p(a;|y, 7 *^*)). 

It can be seen that the EM algorithm, at each iteration, 
requires to update the posterior distribution p{x\y, 7 *^*)), 
which involves computing an TV x TV matrix inverse. Thus 
the EM-based sparse Bayesian learning algorithm has a com¬ 
putational complexity of order O(TV^) flops, and therefore is 
not suitable for many real-world applications with increasingly 
large data sets and unprecedented dimensions. We, in the 
following, will develop a computationally efficient sparse 
Bayesian learning algorithm via GAMP. 


III. Proposed SBL-GAMP Algorithm 
G eneralized approximate message passing (GAMP) is a 
very-low-complexity Bayesian iterative technique recently de¬ 
veloped [ 8 ], [9] for providing an approximation of the pos¬ 
terior distribution p(a:|y, 7 ^*)), conditioned on that the 

prior distribution for x the distribution for the additive noise 
w are factorizable. It therefore can be naturally embedded 
within the EM framework to provide an approximate posterior 
distribution of x to replace the true posterior distribution. 
Erom the GAMP’s point of view, the hyperparameters {q!, 7 } 
are considered as known and fixed. The hyperparameters can 
be updated in the M-step based on the approximate posterior 
distribution of x. 


A. GAMP 


GAMP was developed in a message passing-based frame¬ 
work. By using central-limit-theorem approximations, the 
message passing between variable nodes and factor nodes 
can be greatly simplified, and the loopy belief-propagation on 
the underlying factor graph can be efficiently performed. In 
general, in the GAMP algorithm development, the following 
two important approximations are adopted. 

Let 0 = {Q!, 7 } denote the hyperparameters. Eirstly, GAMP 
assumes posterior independence among hidden variables {xn} 
and approximates the true posterior distribution p{xn\y, 6) by 


piXn\y,fn,T}^,9) 


p{Xn\9)M{Xn\fn,T^) 

Ja;PiXn\dWiXn\fn,Tji) 


(5) 


where f„ and are quantities iteratively updated during 
the iterative process of the GAMP algorithm, here we have 
dropped their explicit dependence on the iteration number 
k for simplicity. Substituting © into ©, it can be easily 
verified that the approximate posterior p(xn,|y, f„, r^, 0 ) fol¬ 
lows a Gaussian distribution with its mean and variance given 
respectively as 


ti"" = 


1 -f q;„tT: 




( 6 ) 

(7) 


The other approximation is made to the noiseless output 
Zm = CL^x, where denotes the mth row of A. GAMP 
approximates the true marginal posterior p{zm\y, 0) by 


p{zm\y,p„i,T^,e) 


p{ym\Zm,d)M{Zm\Pni,T:^) 

!^P{ym\Zm,9)N{Zm\Pm,T^) 


where pm and are quantities iteratively updated during 
the iterative process of the GAMP algorithm, again here we 
dropped their explicit dependence on the iteration number k. 
Under the additive white Gaussian noise assumption, we have 
p{yTa\Zm,9^ — {yTa\Zm ,^/ThuS p{^Zni\y,Pni,X^^ O') 
also follows a Gaussian distribution with its mean and variance 
given by 


z ^T^iy-m+Pm 

1+7T^ 


1 + 'yxir 


(9) 

( 10 ) 
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With the above approximations, we can now define the 
following two important scalar functions: (?in(-) and gout(-) that 
will be used in the GAMP algorithm. In the minimum mean- 
squared error (MMSE) mode, the input scalar function gin{-) 
is simply defined as the posterior mean [9], i.e. 


Sin(r»,<,0) = nl 


fn 

1 + Q:„r^ 


( 11 ) 


The scaled partial derivative of 0) with respect 

to fn is the posterior variance </)“, i.e. 


■A. 

'dfr^ 


■gin(fn,Tn,0) = ((>n = 


1 -f a„Tr 


( 12 ) 


The output scalar function Pout(') is related to the posterior 
mean as follows 


goutiPmj 0) — p {Pm Pm) — 




1 + JTi 


Pm 

p Pm 


(13) 


The partial derivative of Pout(PmjA 
posterior variance (p^ in the following way 
d -tP 


m,i'm,0) is related to the 


'dp, 


~goutiPm, Tm, 0) — 


-l^m 
1 + ITn 


(14) 


Given definitions of Pin(') and Pout(')’ '^he GAMP algorithm 
can now be summarized as follows (details of the derivation 
of the GAMP algorithm can be found in [9]), in which Omn 
denotes the (m,n)th entry of A, Pn{k) and 4>n{k) denote the 
posterior mean and variance of at iteration k, respectively. 


GAMP Algorithm 

Initialization: given set fc = 0, sL = 0,Vm € 
{ 1 ,..., M}; {pUk)}Ai and are initialized as 

the mean and variance of the prior distribution. 

Repeat the following steps until |/i^(fc-|-l)—/i^(fc)P < e, 

where e is a pre-specified error tolerance. 

Step 1. Vm e {1,... ,M}: 

^mik) = ^ ^ O’mnPni.^) 

n 

Tmik) = ^al,nK{k) 

n 

Pm{k) =Zm{k) - TpP{k)Sm{k - 1) 

Step 2. Vto G {1,..., M}: 

Sm{k) =goutiPmik),T^ik),0^*'>) 

T'm(fc) =- -^9outiPm{k),T^{k),0^d'^ 

^Pm 

Step 3. Vn G {1,..., A^}: 

m 

Step 4. Vn € {1, -. -, A^}: 

(fc+ 1 ) =< (A) (f „ (fc), (fc), 0 (‘)) 

Output: {f„(fco),T;(fco)}, {Pm(fco),T^(fco)}, and {^^(fco + 
1 ), Pn{ko + 1 )}, where fco stands for the last iteration. 


We have now derived an efficient algorithm to generate 
approximate posterior distributions for the variables x and 
jz = Ax. We see that the GAMP algorithm no longer needs 
to compute an inverse of a matrix. The dominating operations 
in each iteration is the simple matrix multiplications, which 
scale as 0{MN). Thus the computational complexity can be 
significantly reduced. In the following, we discuss how to 
update the hyperparameters via the EM. 


B. Hyperparameter Learning via EM 

As indicated earlier, in the EM framework, the hyperpa¬ 
rameters are estimated by treating x as hidden variables and 
iteratively maximizing the Q-function, i.e. 

= argmaxQ(0|0(‘^) = E^^yf,(,t)[logp{0\x,y)] (15) 


We first carry out the M-step for the hyperparameters {an}- 
We take the partial derivative of the Q-function with respect 
to an, which yields 

[logp{0\x, y)] 

a” 


= [logp(a:„|a„)p(a„; a, b)] 


1 

20(71 

denotes the 


i^i) 


a — 1 


O'ri 


-b 


(16) 


where (•) denotes the expectation with respect to 
p{x\y,0^d'j. Since the true posterior is unavailable, we use 
p{xn\y,fn{ko),Tn{ko),0^^^), i.e. the approximate posterior 
distribution of Xn obtained from the GAMP algorithm 
to replace the true posterior distribution. Recalling that 
p{xn\y,fn{ko),Tn{ko),0^^'^) follows a Gaussian distribution 
with its mean and variance given by ©-(El), we have 


«) = 


(rn(fco))^ 


:iko) 


( 1 -f Q;i‘V;(fco ))2 1 + a^n^Tjl{ko) 

Setting ( fTfil l equal to zero gives the update rule for 
2 a- 1 


v(‘+l) 


VnG W} 


(17) 


(18) 


2b+{xl) 

We now discuss the update of the hyperparameter 7 , the 
inverse of the noise variance. Since the GAMP algorithm also 
provides an approximate posterior distribution for the noiseless 
output z, we can simply treat 2 ; as hidden variables when 
learning the noise variance, i.e. 

yd+i) = argmax£'^|j^e(t)[logp(y|2;,7)p(7;c,(i)] 


(19) 


Taking the partial derivative of the Q-function with respect to 
7 gives 

d 

[logP(yN> 7)p(7; c, d)] 

117 1 \ . . , 2 \ C — 1 

- 2 ^ - d 

^ m=l ^ 


( 20 ) 


where (•) denotes the expectation with respect to 

t.e. the approximate posterior 
distribution of Zm- Recalling that the approximate posterior 
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Fig. 1. (a). Phase transitions of respective algorithms; (b). Average run times Fig. 2. NMSEs of respective algorithms vs. the ratio M/N. 
vs. N. 


of follows a Gaussian distribution with its mean and 
variance given by dUl-lO, we have 

{{ym ~ Zm) ) = {Vm ~ fJ'm) 'Pm 


where fj.^ and are given by (l9ll-(fT0t. with {Pm,T^} 
replaced by {pm{ko), t^(A:o)}, and 7 replaced by 7 ^*^. Setting 
the derivative equal to zero, we obtain the update rule for 7 
as 


7(‘+i) = 


M + 2c-2 
‘2‘d + — Zm)"^) 


( 22 ) 


So far we have completed the development of our GAMP- 
based sparse Bayesian learning algorithm. For clarify, we now 
summarize our proposed SBL-GAMP algorithm as follows. 


SBL-GAMP Algorithm 

T Initialization: given and 7 *^°^ 

2. For t > 0: given and 7 *^*^ recall the GAMP 

algorithm. Based on the outputs of the GAMP algo¬ 
rithm, update the hyperparameters and 7 (^+ 1 ) 

according to (fTSl) and (l 22 l l. 

3. Continue the above iteration until the difference be¬ 
tween two consecutive estimates of x is negligible. 


IV. Simulation Results 

We now carry out experiments to illustrate the performance 
of the proposed SBL-GAMP algorithrrQ- In our simulations, 
the AT-sparse signal is randomly generated with its support 
set randomly chosen according to a uniform distribution. The 
measurement matrix A G is randomly generated with 

each entry independently drawn from Gaussian distribution 
with zero mean and unit variance, and then each column of 
A is normalized to unit norm. We compare our method with 
the conventional EM-based sparse Bayesian learning (referred 
to as SBL-EM) method [4] and the BP-AMP algorithm [ 8 ]. 

We first examine the phase transition behavior of respective 
algorithms. The phase transition is used to illustrate how 
sparsity level (K/M) and the oversampling ratio (M/N) affect 
the success rate of each algorithm in exactly recovering sparse 
signals in noiseless scenarios. In particular, each point on the 

'Codes are available at http://www.junfang-uestc.net/codes/SBL-GAMP.rar 


phase transition curve corresponds to a success rate equal to 
0.5. The success rate is computed as the ratio of the number 
of successful trials to the total number of independent runs. 
A trial is considered successful if the normalized squared 
error \\x — ^lH/lla^lli *1° greater than 10“®. Pig. [Tta) plots 
the phase transitions of respective algorithms, where we set 
N = 1000, and the oversampling ratio M/N varies from 0.05 
to 0.95. Prom Pig. [TJa), we see that, when M/N < 0.5, the 
proposed SBL-GAMP algorithm achieves performance similar 
to SBL-EM, and is superior to BP-AMP. The proposed method 
is surpassed by BP-AMP as the oversampling ratio increases. 
Nevertheless, SBL-GAMP is still more appealing since we 
usually prefer compressed sensing algorithms work under high 
compression rate regions. The average run times of respective 
algorithms as a function of the signal dimension N is plotted 
in Pig.Ulb), where we set M = OAN and K = 0.3M. Results 
are averaged over 10 independent runs. We see that the SBL- 
GAMP consumes much less time than the SBL-EM due to its 
easy computation of the posterior distribution of x, particularly 
for a large signal dimension N. Also, it can be observed that 
the average run time of the SBL-EM grows exponentially with 
N, whereas the average run time of the SBL-GAMP grows 
very slowly with an increasing N. This observation coincides 
with our computational complexity analysis very well. Lastly, 
we examine the recovery performance in a noisy scenario, 
where we set N = 500, K = 40, and the signal to noise ratio 
(SNR) is set to 20dB. Pig. |2] depicts the normalized mean 
square errors (NMSE) of respective algorithms vs. M/N. 
Results are averaged over 1000 independent runs. We see 
that the SBL-GAMP algorithm achieves a similar recovery 
accuracy as the SBL-EM algorithm even with a much lower 
computational complexity. 

V. Conclusions 

We developed a computationally efficient sparse Bayesian 
learning (SBL) algorithm via the GAMP technique. The pro¬ 
posed method has a much lower computational complexity (of 
order 0{MN)) than the conventional SBL method. Simulation 
results show that the proposed method achieves recovery 
performance similar to the conventional SBL method in the 
low oversampling ratio regime. 






























5 


References 

[1] J. A. Tropp and A. C. Gilbert, “Signal recovery from random mea¬ 
surements via orthogonal matching pursuit,” IEEE Trans. Information 
Theory, vol. 53, no. 12, pp. 4655^666, Dec. 2007. 

[2] E. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. 
Information Theory, no. 12, pp. 4203-4215, Dec. 2005. 

[3] D. Wipf and S. Nagarajan, “Iterative reweighted ii and £2 methods for 
finding sparse solutions,” IEEE Journals of Selected Topics in Signal 
Processing, vol. 4, no. 2, pp. 317-329, Apr. 2010. 

[4] M. Tipping, “Sparse Bayesian learning and the relevance vector ma¬ 
chine,” Journal of Machine Learning Research, vol. 1, pp. 211-244, 
2001 . 

[5] D. P. Wipf, “Bayesian methods for finding sparse representations,” Ph.D. 
dissertation. University of California, San Diego, 2006. 

[6] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. 
Signal Processing, vol. 56, no. 6, pp. 2346-2356, June 2008. 

[7] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algo¬ 
rithms for compressed sensing,” Proc. Natl. Acad. ScL, vol. 106, pp. 
18 914-18 919, Nov. 2009. 

[8] -, “Message passing algorithms for compressed sensing: I. motiva¬ 

tion and construction,” in Proc. Inf. Theory Workshop, Cairo, Egypt, Jan. 
2010 . 

[9] S. Rangan, “Generalized approximate message passing for estimation 
with random linear mixing,” in Proc. IEEE Int. Symp. Inf Theory (ISIT), 
Saint Petersburg, Russia, Aug. 2011. 

[10] S. Som and P. Schniter, “Compressive imaging using approximate mes¬ 
sage passing and a Markov-tree prior,” IEEE Trans. Signal Processing, 
no. 7, pp. 3439-3448, July 2012. 

[11] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture 
approximate message passing,” IEEE Trans. Signal Processing, no. 19, 
pp. 4658^672, Oct. 2013. 

[12] X. Tan and J. Li, “Computationally efficient sparse Bayesian learning 
via belief propagation,” IEEE Trans. Signal Processing, no. 4, pp. 2010- 
2021, Apr. 2010. 



